Multi-Modal Deep Analysis for Multimedia
Similar Resources
Multi-Modal Multi-Task Deep Learning for Autonomous Driving
Several deep learning approaches have been applied to the autonomous driving task, many employing end-to-end deep neural networks. Autonomous driving is complex, utilizing multiple behavioral modalities ranging from lane changing to turning and stopping. However, most existing approaches do not factor in the different behavioral modalities of the driving task into the training strategy. This pap...
Building Multi-Modal Relational Graphs for Multimedia Retrieval
The abundance of Web 2.0 social media in various media formats calls for integration that takes into account tags associated with these resources. The authors present a new approach to multi-modal media search, based on novel related-tag graphs, in which a query is a resource in one modality, such as an image, and the results are semantically similar resources in various modalities, for instanc...
Deep Multi-Modal Image Correspondence Learning
Inference of correspondences between images from different modalities is an extremely important perceptual ability that enables humans to understand and recognize cross-modal concepts. In this paper, we consider an instance of this problem that involves matching photographs of building interiors with their corresponding floorplan. This is a particularly challenging problem because a floorplan, a...
High-order Deep Neural Networks for Learning Multi-Modal Representations
In multi-modal learning, data consists of multiple modalities, which need to be represented jointly to capture the real-world 'concept' that the data corresponds to (Srivastava & Salakhutdinov, 2012). However, it is not easy to obtain the joint representations reflecting the structure of multi-modal data with machine learning algorithms, especially with conventional neural networks. This is bec...
Journal
Journal title: IEEE Transactions on Circuits and Systems for Video Technology
Year: 2020
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/tcsvt.2019.2940647